Handle invalid UTF-8 in memory markdown #567

Open
googs1025 wants to merge 1 commit into zilliztech:main from googs1025:utf8-markdown-fix
@googs1025

Summary

  • Add a UTF-8 reader that warns and replaces invalid byte sequences.
  • Use it for indexing, expand, and maintenance journal reads.
  • Add regression coverage for index, expand, and maintenance paths.
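
For readers unfamiliar with the approach, the reader described in the first bullet could look roughly like the sketch below. This is not the code from this PR: the function name `read_text_lenient` and the log message are hypothetical, and the sketch only assumes the behavior stated above (warn, then replace invalid byte sequences).

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def read_text_lenient(path):
    """Read a file as UTF-8, warning and substituting U+FFFD on invalid bytes.

    Hypothetical helper for illustration; the PR's actual reader may differ.
    """
    raw = Path(path).read_bytes()
    try:
        # Fast path: the file is valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first undecodable sequence.
        logger.warning(
            "Invalid UTF-8 in %s at byte offset %d; replacing undecodable bytes",
            path,
            exc.start,
        )
        # Slow path: keep going, mapping each bad sequence to U+FFFD.
        return raw.decode("utf-8", errors="replace")
```

Decoding strictly first means the warning fires only for files that actually contain invalid bytes, while valid files are returned unchanged.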

Test Plan

  • .venv/bin/python -m pytest tests/test_core_encoding.py tests/test_cli_error_handling.py tests/test_maintenance.py -q
  • .venv/bin/python -m pytest tests/test_embed_batching.py tests/test_scanner.py -q
  • .venv/bin/python -m py_compile src/memsearch/io.py src/memsearch/core.py src/memsearch/cli.py src/memsearch/maintenance.py tests/test_core_encoding.py tests/test_cli_error_handling.py tests/test_maintenance.py

Note: a full pytest run was attempted in a temporary pip-built environment, but that environment installed pymilvus/milvus-lite 3.0 instead of the uv.lock versions, and tests/test_store.py::test_collection_description failed due to dependency behavior. uv sync --locked --dev was also attempted but failed while downloading numpy because of a network timeout.
